(54) Method and apparatus for accessing a memory core



(57) An apparatus and method for using self-timing 
logic to make at least two accesses to a memory core 
in one clock cycle is disclosed. In one embodiment of 
the invention, a memory wrapper (28) incorporating self- 
timing logic - and a mux - is used to couple a single ac- 
cess memory core (30) to a memory interface unit (10). 
The memory interface unit (10) couples a central 
processing unit (12) to the memory wrapper (28). The 
self-timing architecture as applied to multi-access mem-
ory wrappers avoids the need for calibration. Moreover,
the self-timing architecture provides for a full dissocia-
tion between the environment (what is clocked on the
system clock) and the access to the core. A beneficial
result of the invention is making accesses at the speed of
the core while processing several accesses in one system
clock cycle. In accordance with another aspect of the
invention, the apparatus and method for using self-tim- 
ing logic to make at least two accesses to a memory 
core in one clock cycle is incorporated into a data
processing system, such as a digital signal processor
(DSP). In another embodiment of the invention, a mem- 
ory core (26 embodied within RAM) incorporating the 
self-timing architecture is incorporated directly into the 
processor core thereby avoiding the need for a memory 
wrapper and the time delay associated with passing in-
formation from the processor core via the memory inter-
face unit and to the memory core. Direct incorporation
of a memory core into the processor core facilitates 
more intensive accessing and additional power savings. 
In accordance with yet another aspect of the invention, 
the apparatus and method for using self-timing logic to 
make at least two accesses to a memory core in one 
clock cycle is incorporated into a data processing sys-
tem, such as a digital signal processor (DSP), which is
further incorporated into an electronic computing system,
such as a digital cellular telephone handset.

Description

FIELD OF THE INVENTION 

[0001] The present invention relates to the field of
processors, in particular, but not exclusively, to digital
signal processors and signal processing systems and
to a method and apparatus for accessing a memory core
multiple times in a single clock cycle.

BACKGROUND OF THE INVENTION 

[0002] Signal processing generally refers to the per- 
formance of real-time operations on a data stream. Ac- 
cordingly, typical signal processing applications include 
or occur in telecommunications, image processing, 
speech processing and generation, spectrum analysis 
and audio processing and filtering. In each of these ap- 
plications, the data stream is generally continuous. 
Thus, the signal processor must produce results, or
"through-put", at the maximum rate of the data stream.
[0003] Conventionally, both analog and digital sys- 
tems have been utilized to perform many signal 
processing functions. Analog signal processors, though 
typically capable of supporting higher through-put rates, 
are generally limited in terms of their long term accuracy 
and the complexity of the functions that they can per- 
form. In addition, analog signal processing systems are 
typically quite inflexible once constructed and, there-
fore, best suited only to the singular application
anticipated in their initial design.

[0004] A digital signal processor provides the oppor- 
tunity for enhanced accuracy and flexibility in the per- 
formance of operations that are very difficult, if not im- 
practicably complex, to perform in an analog system. 
Additionally, digital signal processor systems typically 
offer a greater degree of post-construction flexibility 
than their analog counterparts, thereby permitting more 
functionally extensive modifications to be made for sub- 
sequent utilization in a wider variety of applications. 
Consequently, digital signal processing is preferred in 
many applications. 

[0005] Within a digital signal processor, a memory 
wrapper is an interface between a memory core and a 
sea of gates. A combination of a memory core and a 
memory wrapper can be considered a memory module. 
In FIG. 1 , a memory interface (10) couples a CPU (12) 
to a single access memory module (14). Memory mod- 
ule (14) comprises a single bus (16) coupling a single 
access memory core (18) to a memory wrapper (20). 
Multiple buses (22) couple memory wrapper (20) to 
memory interface (10). In a single access memory mod- 
ule, such as memory module (14), only one access is 
performed in one cycle. In this embodiment, a system
clock typically serves as the strobe of the memory core
and the memory wrapper serves solely as a bus arbitra-
tor that allows a CPU to perform a single access to the
memory core in one cycle.



SUMMARY OF THE INVENTION 

[0006] In accordance with a first aspect of the inven-
tion, there is provided an apparatus and/or method for
using self-timing logic to make at least two accesses to
a memory core in one clock cycle. In one embodiment
of the invention a memory wrapper incorporating self-
timing logic and a mux(es) is used to couple a multiple
access memory core to a memory interface unit. The
memory interface unit couples a central processing unit
to the memory wrapper. The self-timing architecture as
applied to multi-access memory wrappers avoids the
need for calibration. Moreover, the self-timing architec-
ture provides for a full dissociation between the environ-
ment (what is clocked on the system clock) and the ac-
cess to the core. A beneficial result of the invention is
making accesses at the speed of the core while processing
several accesses in one system clock cycle.
[0007] In another embodiment of the invention, a
memory core incorporating self-timing architecture is in-
corporated directly into the processor core thereby
avoiding the need for a memory wrapper and the time
delay associated with passing information from the
processor core via the memory interface unit and to the
memory core. Direct incorporation of a memory core into
the processor core facilitates more intensive accessing
and additional power savings.

[0008] In accordance with a second aspect of the in-
vention, the apparatus and/or method for using self-tim-
ing logic to make at least two accesses to a memory
core in one clock cycle is incorporated into a data
processing system, such as a digital signal processor
(DSP).

[0009] In accordance with a third aspect of the inven-
tion, the apparatus and/or method for using self-timing
logic to make at least two accesses to a memory core
in one clock cycle is incorporated into a data processing
system, such as a digital signal processor (DSP) which
is further incorporated into an electronic computing sys-
tem, such as a digital cellular telephone handset.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0010] Preferred embodiments of the invention will
now be described, by way of example only, and with ref-
erence to the accompanying drawings, in which:
[0011] FIG. 1 is a block diagram of a prior art data
processing system having a single access memory
core.
[0012] FIG. 2 is a block diagram of a data processing
system according to one embodiment of the invention.
[0013] FIG. 3 is a timing diagram illustrating the sig-
nal exchange between the environment, the memory
wrapper and the memory core.
[0014] FIG. 4 is a block diagram of a memory core
and circuitry for introducing delay or "calibration" be-
tween the rising edge of the clock and the control of the
mux.
[0015] FIG. 5 is a block diagram of a memory core
and circuitry for facilitating multiple accesses to a mem- 
ory core in a single cycle, according to another embod- 
iment of the invention. 

[0016] FIG. 6 is a timing diagram illustrating the sig-
nal exchange between the environment, the memory
wrapper and the memory core, that implements self-tim-
ing logic for switching data that must be written into the
memory core, according to an embodiment of the inven-
tion.

[0017] FIG. 7 is a timing diagram illustrating the sig-
nal exchange between the environment, the memory
wrapper and the memory core, that implements self-tim-
ing logic for latching data that are output from the mem-
ory core, according to an embodiment of the invention.
[0018] FIG. 8 is a timing diagram illustrating the sig-
nal exchange between the environment, the memory
wrapper and the memory core, in an embodiment of the 
invention that permits triple access to the memory core 
in one cycle. 

[0019] FIG. 9 is a schematic block diagram of a proc- 
essor in accordance with an embodiment of the inven- 
tion. 

[0020] FIG. 10 is a schematic block diagram illustrat- 
ing how the four main elements of the core processor of 
FIG. 9 are coupled to multiple access memory 26. 
[0021] FIG. 11 is a schematic block diagram illustrat- 
ing a P Unit, A Unit and D Unit of the core processor of 
FIG. 10.

[0022] FIG. 12 is a schematic illustration of the oper-
ation of an I Unit of the core processor of FIG. 10.
[0023] FIG. 13 is a diagrammatic illustration of the 
pipeline stages for the core processor of FIG. 10. 
[0024] FIG. 14 is a diagrammatic illustration of
stages of a thread through the pipeline of the proces-
sor of FIG. 9.

[0025] FIG. 15 illustrates a technique for coupling 
multiple access memory 26 to memory interface unit 48. 
[0026] FIG. 16 illustrates an optional embodiment of 
a processor core in which multiple access memory 26 
is incorporated into the processor core. 
[0027] FIG. 17 is a schematic illustration of a digital
signal processor (DSP), in which a memory core and
circuitry for facilitating multiple accesses to a memory
core in a single cycle are incorporated, according to
another embodiment of the invention.

[0028] FIG. 18 is a schematic illustration of an exem- 
plary battery powered computing system, implemented 
as a wireless telephone, including the DSP of FIG. 15, 
according to a preferred embodiment of the invention. 

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

[0029] An improvement over the single access mem-
ory module shown in FIG. 1 is a multi-access memory
module, in which several accesses can be performed in
one cycle. FIG. 2 illustrates a multi-access memory
module 26 according to a preferred embodiment of the
invention. A memory interface unit 10 couples a CPU
12 to a multi-access memory module 26. Multi-access
memory module 26 comprises a memory wrapper 28
coupling memory interface unit 10 to single-access
memory core 30 (in this particular case multi-access
memory module is a dual-access RAM). Coupling of
memory wrapper 28 to memory core 30 is provided by
an address bus (ADDR), a data in bus (d IN), a data out
bus (d OUT), a first signal line for an access ready signal
(accrdy), a second signal line for an output ready signal
(ordy), and at least two signal lines for strobe signals
(three shown: strobe 1; strobe 2; and strobe 3).
[0030] Multi-accessing within a single cycle faces
problems not associated with single accessing. One
problem is determining how to sequence the accesses
in one cycle.

[0031] Another problem is determining what signal
can be used to change the data at the boundary of a
multi-access RAM memory core. Embodiments of the
present invention overcome both of these problems.
FIG. 3 is a timing diagram illustrating the signal ex-
change between the environment (CPU 12 & memory
interface unit 10), the memory wrapper 28 and the mem-
ory core 30 in a LEAD 3 Megacell designed and pro-
duced by Texas Instruments Incorporated (described in
more detail later). In a dual-access environment, there
are two accesses to the memory core in one cycle. The
memory module is accessed by buses C and D while
the addresses of buses A and B are temporarily dis-
patched to the memory core. As illustrated in FIG. 3 the
value on the "A address bus" must be held at the bound-
ary of the core until the hold (1) time is achieved before
the "B address bus" is presented to the core. Accord-
ingly, there is a need to switch a mux (not shown) within
memory wrapper 28 at the end of the hold time. To attain
this result, it is necessary to create a delay between the
rising edge of the clock and the control of the mux. FIG.
4 illustrates one technique for creating the desired delay
34, which is also referred to as "calibration". Unfortu-
nately, the approach disclosed in FIG. 4 makes the de-
sign synthesizable only with high difficulty because no
synthesizer can certify a minimum delay on a path.
[0032] The inclusion of self-timed logic 36 in wrap-
pers, as illustrated in FIG. 5, overcomes the high diffi-
culty aspect of making the design synthesizable. The
self-timed logic delivers signals when an action can oc-
cur. As an example, the self-timed logic of the memory
core 30 can produce a signal (accrdy) to indicate, "the
hold time on the address bus is achieved, it is possible
to present a new address on the bus". The mux will
switch the address bus as soon as the core can accept
another address. As a result, there is no need to cali-
brate anything because the hold time on the core ad-
dress bus will be given by construction. To be more pre-
cise concerning the functioning of the "logic", the mux
will switch using "accrdy" if several accesses are linked
up and a system clock is used in the case of the first
access because "accrdy" has not been generated yet.
The A bus address is switched using a system clock
while the B address bus is switched using the "accrdy"
signal. In a dual-access RAM implementation of a mem-
ory core, such as Texas Instruments' LEAD 3 Megacell,
a multistrobe core is used with strobe 1 being the system
clock and strobe 2 being "not system clock", as illustrat-
ed in FIG. 6.
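
The mux control rule of paragraph [0032] can be sketched as a
small behavioural model. This is only an illustration in Python,
not the disclosed circuit: the function name mux_select_events
and the trigger labels "system_clock" and "accrdy" are
assumptions introduced here for readability.

```python
def mux_select_events(buses):
    """Return (bus, trigger) pairs for the accesses linked up in one cycle.

    Assumed model: the first address is switched onto the core by the
    system clock, because accrdy has not been generated yet; each later
    address is switched when the core signals accrdy for the access
    that precedes it.
    """
    events = []
    for index, bus in enumerate(buses):
        trigger = "system_clock" if index == 0 else "accrdy"
        events.append((bus, trigger))
    return events


# Dual access: the A address first, then the B address.
print(mux_select_events(["A", "B"]))
```

The same rule extends unchanged to three or more linked
accesses, since only the first one depends on the system clock.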

[0033] In addition to being used for addressing, the
self-timing logic is used for switching data that must be
written in the memory core. Likewise, the same process is
used to latch the data that are output from the memory
core. As an example, the self-timed signal "ordy" (output
ready, which is active low) can be used to latch the valid
data from the core. In such an implementation, it is not
necessary to use the system clock to latch the output
data, as illustrated in FIG. 7. Moreover, using the access
ready "accrdy" and the output ready "ordy" self-timing
signals, it is possible to link up more than two accesses in
a single cycle of the clock period if we assume, for example,
that the rising edge of "ordy" signifies the end of the
cycle time of the memory. FIG. 8 illustrates
the timing diagram of a triple access in one cycle. The
system clock initializes the process after which the self-
timing logic can link up accesses by itself without the
help of the system clock. As a result, the accesses to
the core are fully decorrelated from the system clock.
[0034] The self-timing architecture of embodiments of
the present invention as applied to memory wrappers
reduces and preferably avoids calibration problems.
Moreover, the self-timing logic of embodiments of the
present invention facilitates a full dissociation between
the environment (what is clocked on the system clock)
and the access to the core. A direct application is to
make accesses at the speed of the core to process sev-
eral accesses in one system clock cycle.
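
As a rough illustration of making accesses at the speed of the
core, the following Python sketch counts how many back-to-back
accesses fit in one system clock cycle. It assumes a simplified
model that the text does not state: every access occupies the
core for a fixed core cycle time, and the next access starts as
soon as the previous one ends. The function names and the
example timings are hypothetical.

```python
def accesses_per_clock(clock_period_ns, core_cycle_ns):
    """Number of whole core cycles that fit in one system clock period."""
    if core_cycle_ns <= 0:
        raise ValueError("core cycle time must be positive")
    return int(clock_period_ns // core_cycle_ns)


def access_start_times(clock_period_ns, core_cycle_ns):
    """Start time of each linked access, measured from the clock edge.

    The first access starts on the clock edge (time 0); each following
    access starts when the previous core cycle has completed.
    """
    count = accesses_per_clock(clock_period_ns, core_cycle_ns)
    return [k * core_cycle_ns for k in range(count)]


# Hypothetical numbers: a 10 ns system clock and a 3 ns core cycle.
print(accesses_per_clock(10, 3))
print(access_start_times(10, 3))
```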
[0035] The basic architecture of an example of a proc- 
essor according to the invention will now be described. 
[0036] Figure 9 is a schematic overview of a proces-
sor 40 (in this particular embodiment a LEAD 3 Megacell
manufactured by Texas Instruments Incorporated) in-
corporating an apparatus for applying self-timing logic
to a multi-access memory wrapper in accordance with
a preferred embodiment of the present invention. The
processor includes a processing engine 42 and a proc-
essor backplane 44. In a particular example of the in-
vention, the processor is a Digital Signal Processor im-
plemented in an Application Specific Integrated Circuit
(ASIC) which together form a digital signal processor
Megacell. As shown in Figure 9, the processing engine
42 forms a central processing unit (CPU) with a process-
ing core 46 and a memory interface unit 48 for interfac-
ing the processing core 46 with memory units external
to the processor core 46.

[0037] The processor backplane 44 comprises a
backplane bus 50, to which the memory management
unit 48 of the processing engine is connected. Also con-
nected to the backplane bus 50 are an instruction cache
memory 52, peripheral devices 54 and an external in-
terface 56. It will be appreciated that in other examples,
the invention could be implemented using different con-
figurations and/or different technologies. For example,
the processing engine 42 could form the processor 40,
with the processor backplane 44 being separate there-
from. The processing engine 42 could, for example, be
a DSP separate from and mounted on a backplane 44
supporting a backplane bus 50, peripheral and external
interfaces. The processing engine 42 could, for exam-
ple, be a microprocessor rather than a DSP and could
be implemented in technologies other than ASIC tech-
nology. The processing engine or a processor including
the processing engine could be implemented in one or
more integrated circuits.

[0038] Figure 10 illustrates the basic structure of an
embodiment of the processor core 46. As illustrated, this
embodiment of processor core 46 includes four ele-
ments, namely an Instruction Buffer Unit (I Unit) 58 and
three execution units, which are coupled to multi-access
memory 26. The execution units are a Program Flow
Unit (P Unit) 60, Address Data Flow Unit (A Unit) 62 and
a Data Computation Unit (D Unit) 64 for executing in-
structions decoded from the Instruction Buffer Unit (I
Unit) 58 and for controlling and monitoring program flow.
[0039] Figure 11 illustrates the execution units P Unit
60, A Unit 62 and D Unit 64 of the processing core 46 
in more detail and shows the bus structure connecting 
the various elements of the processing core 46. The P 
Unit 60 includes, for example, loop control circuitry, 
GoTo/Branch control circuitry and various registers for 
controlling and monitoring program flow such as repeat 
counter registers and interrupt mask, flag or vector reg- 
isters. The P Unit 60 is coupled to general purpose Data 
Write busses (EB, FB) 66, 68, Data Read busses (CB,
DB) 70, 72 and a coefficient program bus (BB) 74. Ad- 
ditionally, the P Unit 60 is coupled to sub-units within the 
A Unit 62 and D Unit 64 via various busses such as CSR, 
ACB and RGD, the description and relevance of which 
will be discussed hereinafter as and when necessary in 
relation to particular aspects of embodiments in accord- 
ance with the invention. 

[0040] As illustrated in Figure 11, in the present em-
bodiment the A Unit 62 includes three sub-units, namely
a register file 76, a data address generation sub-unit
(DAGEN) 78 and an Arithmetic and Logic Unit (ALU) 80.
The A Unit register file 72 includes various registers,
among which are 16 bit pointer registers (AR0, ... AR7)
and data registers (DR0, ... DR3) which may also be
used for data flow as well as address generation. Addi-
tionally, the register file includes 16 bit circular buffer
registers and 7 bit data page registers. As well as the
general purpose busses (EB, FB, CB, DB) 66, 68, 70, 72,
a coefficient data bus 82 and a coefficient address bus
84 are coupled to the A Unit register file 72. The A Unit
register file 72 is coupled to the A Unit DAGEN unit 78
by unidirectional buses 86 and 88 respectively operating
in opposite directions. The DAGEN unit 78 includes 16
bit X/Y registers and coefficient and stack pointer regis-
ters, for example for controlling and monitoring address
generation within the processing engine 42.
[0041] The A Unit 62 also comprises a third unit, the
ALU 80 which includes a shifter function as well as the
functions typically associated with an ALU such as ad-
dition, subtraction, and AND, OR and XOR logical op-
erators. The ALU 80 is also coupled to the general pur-
pose buses (EB, DB) 66, 72 and an instruction constant
data bus (KDB) 82. The A Unit ALU is coupled to the P
Unit 60 by a PDA bus for receiving register content from
the P Unit 60 register file. The ALU 80 is also coupled
to the A Unit register file 72 by busses RGA and RGB
for receiving address and data register contents and by
a bus RGD for forwarding address and data registers in
the register file 72.

[0042] In accordance with the illustrated embodiment
of the invention D Unit 64 includes five elements, namely
a D Unit register file 90, a D Unit ALU 92, a D Unit shifter
94 and two Multiply and Accumulate units
(MAC1, MAC2) 96 and 98. The D Unit register file 90, D
Unit ALU 92 and D Unit shifter 94 are coupled to buses
(EB, FB, CB, DB and KDB) 66, 68, 70, 72 and 82, and the
MAC units 96 and 98 are coupled to the buses (CB, DB,
KDB) 70, 72, 82, and Data Read bus (BB) 86. The D
Unit register file 90 includes 40-bit accumulators
(AC0, ... AC3) and a 16-bit transition register. The D Unit
64 can also utilize the 16 bit pointer and data registers
in the A Unit 62 as source or destination registers in ad-
dition to the 40-bit accumulators. The D Unit register file
90 receives data from the D Unit ALU 92 and MACs 1 & 2
96, 98 over accumulator write buses (ACW0, ACW1)
100, 102, and from the D Unit shifter 94 over accumu-
lator write bus (ACW1) 102. Data is read from the D Unit
register file accumulators to the D Unit ALU 92, D Unit
shifter 94 and MACs 1 & 2 96, 98 over accumulator read
busses (ACR0, ACR1) 104, 106. The D Unit ALU 92
and D Unit shifter 94 are also coupled to sub-units of the
A Unit 60 via various buses such as EFC, DRB, DR2
and ACB for example, which will be described as and
when necessary hereinafter.

[0043] Referring now to Figure 12, there is illustrated
an instruction buffer unit 58 in accordance with the
present embodiment of the invention, comprising a 32
word instruction buffer queue (IBQ) 108. The IBQ 108
comprises 32x16 bit registers 110, logically divided into
8 bit bytes 112. Instructions arrive at the IBQ 108 via the
32 bit program bus (PB) 114. The instructions are
fetched in a 32 bit cycle into the location pointed to by
the Local Write Program Counter (LWPC) 116. The
LWPC 116 is contained in a register located in the P Unit
60. The P Unit 60 also includes the Local Read Pro-
gram Counter (LRPC) 118 register, and the Write Pro-
gram Counter (WPC) 120 and Read Program Counter
(RPC) 122 registers. LRPC 118 points to the location in
the IBQ 108 of the next instruction or instructions to be
loaded into the instruction decoder/s 124 and 126. That
is to say, the LRPC 118 points to the location in the IBQ
108 of the instruction currently being dispatched to the
decoders 124, 126. The WPC points to the address in
program memory of the start of the next 4 bytes of in-
struction code for the pipeline. For each fetch into the
IBQ the next 4 bytes from the program memory are
fetched regardless of instruction boundaries. The RPC
122 points to the address in program memory of the in-
struction currently being dispatched to the decoder/s
124/126.
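
The roles of the write-side counters can be illustrated with a
small Python model. This is a sketch under stated assumptions,
not the disclosed hardware: the class name and method names are
hypothetical, and the wrap-around of the local write pointer
(treating the queue as circular) is an assumption that the text
does not spell out.

```python
class InstructionBufferQueue:
    """Toy model of the 64 byte IBQ and its write-side counters."""

    SIZE_BYTES = 64   # 32 words x 16 bits, as stated in the text
    FETCH_BYTES = 4   # one 32 bit program bus transfer per fetch

    def __init__(self, program_memory, start_address=0):
        self.program_memory = program_memory
        self.buffer = bytearray(self.SIZE_BYTES)
        self.lwpc = 0              # local write pointer into the queue
        self.wpc = start_address   # next program memory address to fetch

    def fetch(self):
        """Copy the next 4 program bytes into the queue.

        The fetch ignores instruction boundaries: it always moves
        FETCH_BYTES bytes. Wrapping lwpc modulo the queue size is an
        assumption of this sketch.
        """
        for i in range(self.FETCH_BYTES):
            slot = (self.lwpc + i) % self.SIZE_BYTES
            self.buffer[slot] = self.program_memory[self.wpc + i]
        self.lwpc = (self.lwpc + self.FETCH_BYTES) % self.SIZE_BYTES
        self.wpc += self.FETCH_BYTES


program = bytes(range(16))          # stand-in for program memory
queue = InstructionBufferQueue(program)
queue.fetch()
queue.fetch()
print(queue.lwpc, queue.wpc, list(queue.buffer[:8]))
```

The read-side counters (LRPC and RPC) would advance by the
decoded instruction lengths rather than by a fixed 4 bytes; they
are left out to keep the sketch short.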

[0044] In accordance with this embodiment, the in-
structions are formed into a 48 bit word and are loaded
into the instruction decoders 124, 126 over a 48 bit bus
128 via multiplexors 130 and 132. It will be apparent to
a person of ordinary skill in the art that the instructions
may be formed into words comprising other than 48-bits,
and that the present invention is not to be limited to the
specific embodiment described above.
[0045] The bus 128 can load a maximum of 2 instruc-
tions, one per decoder, during any one instruction cycle.
The combination of instructions may be in any combi-
nation of formats, 8, 16, 24, 32, 40 and 48 bits, which
will fit across the 48 bit bus. Decoder 1, 124, is loaded
in preference to decoder 2, 126, if only one instruction
can be loaded during a cycle. The respective instruc-
tions are then forwarded on to the respective function
units in order to execute them and to access the data
for which the instruction or operation is to be performed.
Prior to being passed to the instruction decoders, the
instructions are aligned on byte boundaries.
[0046] The alignment is done based on the format de- 
rived for the previous instruction during decode thereof. 
The multiplexing associated with the alignment of in-
structions with byte boundaries is performed in multi-
plexors 130 and 132.
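
The loading rule of paragraph [0045] can be expressed as a short
Python check. The sketch assumes that the 48 bit bus width is
the only constraint on pairing two instructions; any further
hardware restrictions are not modelled. The function name
instructions_loaded is hypothetical.

```python
BUS_WIDTH_BITS = 48
VALID_FORMATS = (8, 16, 24, 32, 40, 48)


def instructions_loaded(first_bits, second_bits=None):
    """Return how many instructions can be loaded in one cycle.

    The first instruction always goes to decoder 1. The second one is
    loaded into decoder 2 only if both fit across the 48 bit bus.
    """
    for size in (first_bits, second_bits):
        if size is not None and size not in VALID_FORMATS:
            raise ValueError(f"unsupported instruction format: {size} bits")
    if second_bits is not None and first_bits + second_bits <= BUS_WIDTH_BITS:
        return 2
    return 1


print(instructions_loaded(16, 32))  # both fit: 16 + 32 = 48
print(instructions_loaded(32, 24))  # 56 bits exceed the bus width
```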

[0047] In accordance with a present embodiment the 
processor core 46 executes instructions through a 7 
stage pipeline, the respective stages of which will now 
be described with reference to Figure 13. 
[0048] The first stage of the pipeline is a PRE-FETCH
(P0) stage 134, during which stage a next program
memory location is addressed by asserting an address
on the address bus (PAB) 136 of a memory interface 48.
[0049] In the next stage, FETCH (P1) stage 138, the
program memory is read and the I Unit 58 is filled via
the PB bus 140 from the memory interface unit 48.
[0050] The PRE-FETCH and FETCH stages are sep- 
arate from the rest of the pipeline stages in that the pipe- 
line can be interrupted during the PRE-FETCH and 
FETCH stages to break the sequential program flow and 
point to other instructions in the program memory, for 
example for a Branch instruction. 
[0051] The next instruction in the instruction buffer is
then dispatched to the decoder/s 124/126 in the third
stage, DECODE (P2) 140, and the instruction decoded
and dispatched to the execution unit for executing that
instruction, for example the P Unit 60, the A Unit 62 or
the D Unit 64. The decode stage 140 includes decoding
at least part of an instruction including a first part indi-
cating the class of the instruction, a second part indicat-
ing the format of the instruction and a third part indicating
an addressing mode for the instruction.
[0052] The next stage is an ADDRESS (P3) stage
142, in which the address of the data to be used in the 
instruction is computed, or a new program address is 
computed should the instruction require a program 
branch or jump. Respective computations take place in 
the A Unit 62 or the P Unit 60 respectively. 
[0053] In an ACCESS (P4) stage 144 the address of 
a read operand is generated and the memory operand, 
the address of which has been generated in a DAGEN 
Y operator with a Ymem indirect addressing mode, is 
then READ from indirectly addressed Y memory 
(Ymem). 

[0054] The next stage of the pipeline is the READ (P5) 
stage 148 in which a memory operand, the address of 
which has been generated in a DAGEN X operator with 
an Xmem indirect addressing mode or in a DAGEN C 
operator with coefficient address mode, is READ. The 
address of the memory location to which the result of 
the instruction is to be written is generated. 
[0055] Finally, there is an execution EXEC (P6) stage 
150 in which the instruction is executed in either the A 
Unit 62 or the D Unit 64. The result is then stored in a 
data register or accumulator, or written to memory for 
Read/Modify/Write instructions. Additionally, shift oper- 
ations are performed on data in accumulators during the 
EXEC stage. 

[0056] The basic principle of operation for a pipeline
processor will now be described with reference to Figure
13. As can be seen from Figure 13, for a first instruction
152, the successive pipeline stages take place over time
periods T1-T7. Each time period is a clock cycle for the
processor machine clock. A second instruction 154 can
enter the pipeline in period T2, since the previous in-
struction has now moved on to the next pipeline stage.
For instruction 3, 156, the PREFETCH stage 134 occurs
in time period T3. As can be seen from Figure 13 for a
seven stage pipeline a total of 7 instructions may be
processed simultaneously. For all 7 instructions
152-164, Figure 13 shows them all under process in
time period T7. Such a structure adds a form of parallel-
ism to the processing of instructions.
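
The pipeline occupancy described in paragraph [0056] can be
reproduced with a few lines of Python. The sketch assumes an
ideal pipeline with no stalls or branches, in which instruction
n enters PRE-FETCH in period Tn; the function names are
hypothetical.

```python
STAGES = ["PRE-FETCH", "FETCH", "DECODE", "ADDRESS", "ACCESS", "READ", "EXEC"]


def stage_in_period(instruction, period):
    """Stage occupied by instruction n (1-based) in period Tp, or None.

    Ideal case: instruction n is in stage index (p - n), so it enters
    PRE-FETCH in Tn and reaches EXEC in T(n + 6).
    """
    index = period - instruction
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def in_flight(period, issued):
    """Instructions (out of the first `issued`) inside the pipeline in Tp."""
    return [n for n in range(1, issued + 1)
            if stage_in_period(n, period) is not None]


# In T7 the first instruction is in EXEC and the seventh in PRE-FETCH.
for n in in_flight(7, 7):
    print(n, stage_in_period(n, 7))
```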
[0057] As shown in Figure 14, the present embodi-
ment of the invention includes a memory interface unit
48 which is coupled to external memory units via a 24
bit address bus 166 and a bi-directional 16 bit data bus
168. Additionally, the memory interface unit 48 is cou-
pled to program storage memory (not shown) via a 24
bit address bus 136 and a 32 bit bi-directional data bus
170. The memory interface unit 48 is also coupled to the
I Unit 58 of the machine processor core 46 via a 32 bit
program read bus (PB) 140. The P Unit 60, A Unit 62
and D Unit 64 are coupled to the memory interface unit
48 via data read and data write buses and correspond-
ing address busses. The P Unit 60 is further coupled to
a program address bus 140.



[0058] More particularly, the P Unit 60 is coupled to
the memory interface unit 48 by a 24 bit program ad-
dress bus 140, the two 16 bit data write buses (EB, FB)
66, 68, and the two 16 bit data read buses (CB, DB) 70,
72. The A Unit 62 is coupled to the memory interface
unit 48 via two 24 bit data write address buses (EAB,
FAB) 172, 174, the two 16 bit data write buses (EB, FB)
66, 68, the three data read address buses (BAB, CAB,
DAB) 176, 178, 180 and the two 16 bit data read buses
(CB, DB) 70, 72. The D Unit 64 is coupled to the memory
interface unit 48 via the two data write buses (EB, FB)
66, 68 and three data read buses (BB, CB, DB) 182, 70,
72.

[0059] Figure 14 represents the passing of instruc-
tions from the I Unit 58 to the P Unit 60 at 184, for for-
warding branch instructions for example. Additionally,
Figure 14 represents the passing of data from the I Unit
58 to the A Unit 62 and the D Unit 64 at 186 and 188
respectively.

[0060] In accordance with a preferred embodiment of
the invention, the processing engine is configured to re-
spond to a local repeat instruction which provides for an
iterative looping through a set of instructions all of which
are contained in the Instruction Buffer Queue 108. The
local repeat instruction is a 16 bit instruction and com-
prises: an op-code; parallel enable bit; and an offset (6
bits).

[0061] The op-code defines the instruction as a local
instruction, and prompts the processing engine to ex-
pect the offset and op-code extension. In the described
embodiment the offset has a maximum value of 56,
which defines the greatest size of the local loop as 56
bytes of instruction code.

[0062] Referring now to Figure 12, the IBQ 108 is 64
bytes long and can store up to 32x16 bit words. Instruc-
tions are fetched into IBQ 108 2 words at a time. Addi-
tionally, the Instruction Decoder Controller reads a pack-
et of up to 6 program code bytes into the instruction de-
coders 124 and 126 for each Decode stage of the pipe-
line. The start and end of the loop may fall at any of the
byte boundaries within the 4 byte packet of program
code fetched to the IBQ 108. Thus, the start and end
instructions are not necessarily co-terminous with the top
and bottom of IBQ 108.

[0063] For example, in a case where the local loop
instruction spans two bytes across the boundary of a
packet of 4 program codes, both packets of 4 program
codes must be retained in the IBQ 108 for execution of
the local loop repeat. In order to take this into account
the local loop instruction offset is a maximum of 56
bytes.
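
The alignment argument above can be checked with a short calculation. The following is a minimal sketch, assuming that the buffer must hold every whole 4 byte packet that the loop body touches. The function names are hypothetical; the 64 byte buffer, the 4 byte packet and the 56 byte limit are taken from the description.

```python
IBQ_SIZE_BYTES = 64  # buffer length stated in the description
PACKET_BYTES = 4     # two 16 bit words are fetched at a time
MAX_LOOP_BYTES = 56  # maximum local loop offset stated in the description


def bytes_retained(start_addr: int, loop_bytes: int) -> int:
    """Bytes occupied by all whole packets touched by the loop body."""
    if loop_bytes <= 0:
        raise ValueError("loop body must contain at least one byte")
    first_packet = start_addr // PACKET_BYTES
    last_packet = (start_addr + loop_bytes - 1) // PACKET_BYTES
    return (last_packet - first_packet + 1) * PACKET_BYTES


def fits_in_ibq(start_addr: int, loop_bytes: int) -> bool:
    """True if the touched packets fit in the buffer under this model."""
    return bytes_retained(start_addr, loop_bytes) <= IBQ_SIZE_BYTES
```

Under this simplified model a 56 byte loop fits for every byte alignment of its start, since at worst one partially used packet is retained at each end (64 - 2 x 4 = 56). The model only confirms that the stated limit is safe; it does not claim that 56 is the largest size the hardware could support.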

[0064] When the local loop instruction is decoded the
start address for the local loop, i.e., the address after
the local instruction address, is stored in the Block Re-
peat Start Address 0 (RSA0) register which is located,
for example, in the P unit 60. The repeat start address
also sets up the Read Program Counter (RPC). The lo-
cation of the end of the local loop is computed using the
offset, and the location is stored in the Block Repeat End
Address 0 (REA0) register, which may also be located
in the P unit 60, for example. Two repeat start address
registers and two repeat end address registers (RSA0,
RSA1, REA0, REA1) are provided for nested loops. For
nesting levels greater than two, preceding start/end ad-
dresses are pushed to a stack register.
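
The register handling for nested loops may be modelled in software as follows. This is a sketch under stated assumptions, not the circuit itself: the class name `RepeatRegisters` is hypothetical, the end address is assumed to be the start address plus the offset (the description only says the end is computed using the offset), and the choice of which pair is pushed when a third level is entered is likewise an assumption.

```python
class RepeatRegisters:
    """Behavioural model of two start/end register pairs plus a stack."""

    def __init__(self):
        # Active (start, end) pairs, innermost last; at most two entries,
        # standing in for RSA0/REA0 and RSA1/REA1.
        self.levels = []
        # Pairs pushed out when nesting goes deeper than two levels.
        self.stack = []

    def enter_loop(self, instr_addr, instr_len, offset):
        start = instr_addr + instr_len  # address after the repeat instruction
        end = start + offset            # ASSUMED end-address formula
        if len(self.levels) == 2:
            # Both register pairs are in use: push the oldest pair.
            self.stack.append(self.levels.pop(0))
        self.levels.append((start, end))
        return start, end

    def exit_loop(self):
        self.levels.pop()
        if self.stack:
            # Restore the most recently pushed outer pair.
            self.levels.insert(0, self.stack.pop())
```

For example, entering three nested loops leaves the two innermost pairs in the registers and the outermost pair on the stack; leaving the innermost loop restores the outermost pair.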
[0065] During the first iteration of a local loop, the pro- 
gram code for the body of the loop is loaded into the IBQ 
108 and executed as usual. However, for the following
iterations no fetch will occur until the last iteration, during 
which the fetch will restart. 
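
The fetch behaviour described above can be summarised per iteration. The following sketch uses a hypothetical function name and treats fetching as simply on or off for a whole iteration, which is a simplification of the description.

```python
from typing import List


def fetch_active_per_iteration(iterations: int) -> List[bool]:
    """True where program fetch is active: first and last iteration only."""
    if iterations < 1:
        raise ValueError("a loop runs at least once")
    return [i == 0 or i == iterations - 1 for i in range(iterations)]
```

With five iterations, fetching is active in the first and fifth only, so three iterations run entirely from the buffer.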

[0066] FIG. 15 illustrates a technique for coupling 
multiple access memory 26 to memory interface unit 48. 
Incorporation of the aforementioned self-timing archi-
tecture and multiple-access memory wrappers, such as
with the processor described above, does away with cal- 
ibration problems typically encountered when attempt- 
ing several accesses to a memory core in one clock cy- 
cle. The self-timing logic facilitates a full dissociation be- 
tween environment (what is clocked on the system 
clock) and the access to the core. Moreover, a direct 
application facilitates accesses at the speed of the 
memory core to process several accesses in one sys- 
tem clock cycle. 
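
The principle of serving several accesses in one system clock cycle at the speed of the core can be illustrated with a small behavioural model. This is a sketch only: the class and function names, the nanosecond figures in the usage note, and the representation of the completion signal as a returned duration are all assumptions introduced for illustration.

```python
class SingleAccessCore:
    """A memory core that serves one read or write at a time."""

    def __init__(self, size, access_time_ns):
        self.cells = [0] * size
        self.access_time_ns = access_time_ns

    def access(self, addr, write_data=None):
        """Perform one access; return (data, time taken by the core)."""
        if write_data is not None:
            self.cells[addr] = write_data
        return self.cells[addr], self.access_time_ns


def run_cycle(core, requests, clock_period_ns):
    """Serve requests back to back inside one clock period.

    Each access starts as soon as the previous one reports completion,
    so no calibrated delay between accesses is modelled.
    """
    elapsed_ns = 0.0
    results = []
    for addr, write_data in requests:
        data, duration = core.access(addr, write_data)
        elapsed_ns += duration
        if elapsed_ns > clock_period_ns:
            raise RuntimeError("accesses do not fit in one clock cycle")
        results.append(data)
    return results, elapsed_ns
```

With an assumed 4 ns core and a 10 ns clock period, a write followed by a read of the same address both complete within the cycle, whereas a third access would not fit.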

[0067] Optionally, multiple access memory 26 can al-
so be incorporated directly into the processor core, as
illustrated in FIG. 16. Placing multiple access memory
26 into the processor core facilitates more intensive ac-
cessing and power savings, since the memory wrapper and
the additional time required for accessing the memory core
(via memory interface unit 48) are eliminated.
[0068] Another example of a VLSI integrated circuit 
into which memory wrapper 28 and memory core 30 ac- 
cording to the preferred embodiment of the invention 
may be implemented is illustrated in FIG. 17. The archi- 
tecture illustrated in FIG. 17 for DSP 190 is presented 
by way of example, as it will be understood by those of 
ordinary skill in the art that the present invention may be 
implemented into integrated circuits of various function- 
ality and architecture, including custom logic circuits, 
general purpose microprocessors, and other VLSI and 
larger integrated circuits. 

[0069] DSP 190 in this example is implemented by 
way of a modified Harvard architecture, and as such uti- 
lizes three separate data buses C, D, E that are in com- 
munication with multiple execution units including expo- 
nent unit 192, multiply/add unit 194, arithmetic logic unit 
(ALU) 196, and barrel shifter 198. Accumulators 200 
permit operation of multiply/add unit 194 in parallel with
ALU 196, allowing simultaneous execution of multiply- 
accumulate (MAC) and arithmetic operations. The in- 
struction set executable by DSP 190, in this example, 
includes single-instruction repeat and block repeat op- 
erations, block memory move instructions, two and 
three operand reads, conditional store operations, and
parallel load and store operations, as well as dedicated
digital signal processing instructions. DSP 190 also in-
cludes compare, select, and store unit (CSSU) 202, cou-
pled to data bus E, for accelerating Viterbi computation,
as useful in many conventional communication algo-
rithms.

[0070] DSP 190 in this example includes significant
on-chip memory resources, to which access is control-
led by memory/peripheral interface unit 204, via data
buses C, D, E, and program bus P. These on-chip mem-
ory resources include random access memory (RAM)
206, read-only memory (ROM) 208 used for storage of
program instructions, and data registers 210; program
controller and address generator circuitry 212 is also in
communication with memory/peripheral interface 204,
to effect its functions. Interface unit 214 is also provided
in connection with memory/peripheral interface to con-
trol external communications, as do serial and host ports
216. Additional control functions such as timer 218 and
JTAG test port 220 are also included in DSP 190.
[0071] According to this preferred embodiment of the
invention, the various logic functions executed by DSP
190 are effected in a synchronous manner, according to
one or more internal system clocks generated by PLL
clock generator 222, constructed as described herein-
above. In this exemplary implementation, PLL clock
generator 222 directly or indirectly receives an external
clock signal on line REFCLK, such as is generated by
other circuitry in the system or by a crystal oscillator or
the like, and generates internal system clocks, for ex-
ample the clock signal on line OUTCLK, communicated
(directly or indirectly) to each of the functional compo-
nents of DSP 190.

[0072] DSP 190 also includes power distribution cir-
cuitry 224 for receiving and distributing the power supply
voltage and reference voltage levels throughout DSP
190 in the conventional manner. As indicated in Figure
17, DSP 190 according to the preferred embodiment of
the present invention may be powered by extremely low
power supply voltage levels, such as on the order of 1
volt. This reduced power supply voltage is of course
beneficial in maintaining relatively low power dissipation
levels, and is in large part enabled by the construction
and operation of PLL clock generator 222, which gen-
erates stable and accurate internal clock signals even
with such low power supply voltages. In this embodi-
ment of the invention, multiple access memory 26 is part
of RAM 206, which means it is included in the processor
core. Incorporation of multiple access memory 26 into
the processor core facilitates increased accessing of the
memory core and power savings, since memory wrapper
28 is eliminated and memory interface unit 48 is not used
as an interface between the processing engine and the
multiple access memory 26.
[0073] Referring now to FIG. 18, an example of an
electronic computing system constructed according to
the preferred embodiment of the present invention will
now be described in detail. Specifically, Figure 18 illus-
trates the construction of a wireless communications
system, namely a digital cellular telephone handset 226
constructed according to the preferred embodiment of
the invention. It is contemplated, of course, that many
other types of communications systems and computer
systems may also benefit from the present invention,
particularly those relying on battery power. Examples of
such other computer systems include personal digital
assistants (PDAs), portable computers, and the like. As
power dissipation is also of concern in desktop and line-
powered computer systems and microcontroller appli-
cations, particularly from a reliability standpoint, it is also
contemplated that the present invention may also pro-
vide benefits to such line-powered systems.
[0074] Handset 226 includes microphone M for re- 
ceiving audio input, and speaker S for outputting audible 
output, in the conventional manner. Microphone M and 
speaker S are connected to audio interface 228 which, 
in this example, converts received signals into digital 
form and vice versa. In this example, audio input re- 
ceived at microphone M is processed by filter 230 and 
analog-to-digital converter (ADC) 232. On the output 
side, digital signals are processed by digital-to-analog
converter (DAC) 234 and filter 236, with the results ap- 
plied to amplifier 238 for output at speaker S. 
[0075] The output of ADC 232 and the input of DAC 
234 in audio interface 228 are in communication with 
digital interface 240. Digital interface 240 is connected 
to microcontroller 242 and to digital signal processor 
(DSP) 190 (alternatively, DSP 40 of FIG. 9 could also 
be used in lieu of DSP 190), constructed as described 
hereinabove relative to Figure 15, by way of separate
buses in the example of FIG. 16. 
[0076] Microcontroller 242 controls the general oper- 
ation of handset 226 in response to input/output devices 
244, examples of which include a keypad or keyboard, 
a user display, and add-on cards such as a SIM card. 
Microcontroller 242 also manages other functions such 
as connection, radio resources, power source monitor- 
ing, and the like. In this regard, circuitry used in general 
operation of handset 226, such as voltage regulators, 
power sources, operational amplifiers, clock and timing 
circuitry, switches and the like are not illustrated in FIG.
16 for clarity; it is contemplated that those of ordinary 
skill in the art will readily understand the architecture of 
handset 226 from this description. 
[0077] In handset 226 according to the preferred em-
bodiment of the invention, DSP 190 is connected on one
side to interface 240 for communication of signals to and
from audio interface 228 (and thus microphone M and
speaker S), and on another side to radio frequency (RF)
circuitry 246, which transmits and receives radio signals
via antenna A. Conventional signal processing per-
formed by DSP 190 may include speech coding and de-
coding, error correction, channel coding and decoding,
equalization, demodulation, encryption, voice dialing,
echo cancellation, and other similar functions to be per-
formed by handset 226.

[0078] RF circuitry 246 bidirectionally communicates
signals between antenna A and DSP 190. For transmis-
sion, RF circuitry 246 includes codec 248 which codes
the digital signals into the appropriate form for applica-
tion to modulator 250. Modulator 250, in combination
with synthesizer circuitry (not shown), generates mod-
ulated signals corresponding to the coded digital audio
signals; driver 252 amplifies the modulated signals and
transmits the same via antenna A. Receipt of signals
from antenna A is effected by receiver 254, which ap-
plies the received signals to codec 248 for decoding into
digital form, application to DSP 190, and eventual com-
munication, via audio interface 228, to speaker S.
[0079] The scope of the present disclosure includes 
any novel feature or combination of features disclosed 
therein either explicitly or implicitly or any generalization 
thereof irrespective of whether or not it relates to the 
claimed invention or mitigates any or all of the problems 
addressed by the present invention. The applicant here-
by gives notice that new claims may be formulated to
such features during the prosecution of this application 
or of any such further application derived therefrom. In 
particular, with reference to the appended claims, fea-
tures from dependent claims may be combined with
those of the independent claims in any appropriate man-
ner and not merely in the specific combinations enumer-
ated in the claims.

1. A method of accessing a memory core more than
once in a single clock cycle.

2. The method of Claim 1, wherein said more than 
once is twice. 

3. The method of Claim 1, wherein said more than 
once is at least three times. 

4. The method of Claim 2, wherein said memory core
is incorporated into a dual-access RAM.

5. The method of any one of the previous claims
wherein self-timing logic is used to facilitate access-
ing a memory core more than once in a single clock
cycle.

6. The method of Claim 5, wherein said self-timing log- 
ic is implemented in a memory wrapper coupled to 
said memory core. 

7. The method of Claim 6, wherein said memory wrap- 
per couples said memory core to a memory inter- 
face unit. 

8. The method of any one of the previous claims
wherein said memory core is part of a processing
engine.

9. The method of any one of the previous claims
wherein said memory core is a single access mem-
ory core.

10. The method of any one of the previous claims
wherein said memory core is part of the processor
core.

11. An electronic device, comprising: 

a memory core; and 

circuitry coupleable to said memory core for ac- 
cessing said memory core more than once in a 
single clock cycle. 

12. The device of Claim 11, wherein said memory core 
is part of a dual-access RAM. 

13. The device of Claim 11, wherein said memory core
and said circuitry combine to form a multiple access
memory core.

14. The device of Claim 11 wherein said circuitry is em-
bodied in an electronic device coupling a memory
interface unit to said memory core.

15. The device of any one of Claims 11-14, wherein said
memory interface unit couples a central processing
unit to said electronic device which couples said
memory interface unit to said memory core.

16. The device of any one of Claims 11-14 wherein said
electronic device is a digital signal processor.

17. The device of any one of Claims 11-14 wherein said
memory core is part of the processor core.

18. An electronic system, comprising: 

at least one input/output device; and
an integrated circuit, coupled to the at least one
input/output device, and comprising:

functional circuitry, for executing logical op-
erations upon digital data signals in a syn-
chronous fashion according to an internal
clock signal;

power distribution circuitry, coupled to a
battery, for distributing power to the func-
tional circuitry; and
circuitry coupled to a memory core in said
integrated circuit for accessing said mem-
ory core more than once in a single clock
cycle.